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ABSTRACT 

Despite the discovery of thousands of exoplanets, no exomoons have been detected so far. We test 
a recently developed method for exomoon search, the orbital sampling effect (OSE), using the full 
exoplanet photometry from the Kepler Space Telescope. The OSE is applied to phase-folded transits, 
for which we present a framework to detect false positives, and discuss four candidates which pass 
several of our tests. Using numerical simulations, we inject exomoon signals into real Kepler data 
and retrieve them, showing that under favorable conditions, exomoons can be found with Kepler and 
the OSE method. In addition, we super-stack a large sample of Kepler planets to search for the 
average exomoon OSE and the accompanying increase in noise, the scatter peak. We find a significant 
OSE-like signal, which might indicate the presence of moons, for planets with 35d < P < 80d, having 
an average dip per planet of 6 ± 2ppm, corresponding to a moon radius of 2120jllfgkm for the average 
star radius of 1.24i? 0 in this sample. 

Subject headings: planets and satellites: detection 


1. INTRODUCTION 

Our own solar system hosts 8 planets and 16 large 
(> 1000km) orbiting moons. Although more than 1,500 
exoplanets have been confirmed to date, no exomoon has 
been detected. The question of the existence of exomoons 
is interesting, as it will sharpen our understanding of how 
“normal” our own solar system is in the interstellar con¬ 
text. The frequency of exomoons can also guide plan¬ 
etary formation theory. Lastly, some exomoons might 
even be habitable (Heller & Barnes 2013; Heller et al. 
2014; Forgan & Yotov 2014; Hinkel & Kane 2013). 

Various search methods have been proposed (see e.g. 
Heller (2014), and references therein). In what follows, 
we focus on two photometric methods, dubbed the “or¬ 
bital sampling effect” (OSE, Heller (2014)), and the 
“scatter peak” (SP, Simon et al. (2012)). We will 
show that both complement each other and that Ke- 
pler-quality photometry is sufficient to detect individual 
OSE-like signals. 

2. METHOD 

Planet transits are generally stacked to increase the 
signal-to-noise ratio. The OSE and SP also make use of 
stacking: After adding up many randomly sampled ob¬ 
servations, a photometric flux loss appears in the phase- 
folded transit light curve, reflecting the moon’s blocking 
of light. The effect depends mostly on the moon’s radius 
and planetary distance. The OSE can be used to detect 
a significant flux loss before, during and after the actual 
transit (if present), which might be indicative of an ex¬ 
omoon in transit. The basic idea is that at any given 
transit the moon(s) must be somewhere: They might 
transit before the planet, after the planet, or not at all 

depending on the orbit configuration (Sato & Asada 
2009). But by stacking many such transits, one gets, 
on average, a flux loss before, during and after the ex¬ 
oplanet transit (see Figure 3 in Heller (2014)). The 
depth of this flux loss then reflects the total radii and or¬ 
bits of the exomoon(s) (Figure 1), assuming high enough 
data quality. This is a simplified sketch, as it shows only 
a single-moon configuration. In multiple-moon configu¬ 
rations, an additional feature might be added through 
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Fig. 1.— Numerical simulation of the OSE in a phase-folded 
plot. The circular moon orbit (top) causes higher sampling at times 
away from planetary mid-transit. Each dot on the orbit triggered 
one transit in our simulation, at the given time, over plotted in 
the middle. When averaging the flux over time, the OSE emerges 
(bottom). Just before (and after) planetary transit, the OSE curve 
shows a slightly weaker flux decrease, which can be an indication 
of real exomoon presence in data, if detected. In this example, the 
moon orbit is circular and has zero eccentricity and inclination, 
resulting in a mirror-symmetrical configuration. The jitter in the 
OSE curve is due to the sampling of “only” 40 transits. 

mutual moon eclipses. 

A varying fraction of the flux loss occurs during plane¬ 
tary transit, as will be discussed in Section 4.5. Unfortu¬ 
nately, it is very difficult to measure these “lost photons”. 
The effect is very similar to a slight increase in planetary 
radius (transit depth). Furthermore, limb darkening is 
an unknown parameter and needs to be estimated. Both 
factors, limb darkening and the unknown planetary ra¬ 
dius, easily mask the flux loss during planetary transit. 
For example, in our Sun-Earth-Moon configuration, the 
Earth produces an 84ppm dip, and the Moon OSE con¬ 
tributes ~3ppm (of 6ppm transit depth) during Earth 
transit. Instead of a planetary radius of 6378km, one 
might simply mistake the combined effect for a planet 
of radius 6490km, a 1.8% difference in planetary radius. 
While the change in transit depth itself might be border¬ 
line detectable in favorable cases, one would instead need 
to distinguish the transit shapes between the two scenar- 
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Fig. 2.— Simplified sketch of the SP effect in a phase-folded plot 
showing a stack of 3 epochs. During one epoch, no exomoon transit 
occurred (straight line at flux=l). During the other two epochs, 
exomoon transits occurred before planetary ingress, centered at - 
0.6d and -0.5d. Data points with noise (dots) show higher scatter 
during this phase-folded time. After stacking many transits, the 
scatter peak would be sampled as described by the photometric 
OSE. For clarity, the noise shown here is much lower (by a factor 
of 10) than ideal Kepler noise. 

ios. This will require further modeling (including the ef¬ 
fect of limb darkening uncertainties), which we strongly 
encourage, but it is beyond the scope of this paper. 

On the negative side, the OSE can only detect a sub¬ 
sample of all moon configurations: It requires the moon 
to be in a near sky-coplanar orbit to allow for the moon 
transiting the star (Sato & Asada 2010). Among the 
large solar system moons, 4 out of 16 have inclinations 
over 1°: Earth’s moon (5.5°), Triton (130°), Iapetus 
(17°) and Charon (120°). If this rate is typical, then 
the OSE would work for the other ~75% of cases. 

While the presence of a significant flux loss might be 
an indication of a transiting body, it could also be caused 
by other sources, such as imperfect detrending of data, 
or time-correlated (“red”) stellar noise. It is therefore 
beneficial to have an (almost) independent method at 
hand for vetting purposes. The SP method is based on 
the fact that the geometrical exomoon configuration is 
very likely different during every exoplanet transit: On 
some transits, the moon might be ahead of the planet, 
on other transits behind it. When stacking many tran¬ 
sits, at a given phase folded time, one gets a flux loss 
in some cases, and not in others. This results in in¬ 
creased scatter (photometric noise) when compared to 
out-of-transit times (Figure 2). It is important to note 
that the OSE is only sensitive to flux loss (and not the 
scatter), and the SP includes a median-filter to subtract 
the actual flux loss (if any), and is thus only sensitive to 
the scatter. This makes both methods independent and 
complementary. Future theoretical advantages might al¬ 
low simulations which take both effects into account by 
using Monte-Carlo runs to simulate various planet-moon 
configurations and the resulting OSE+SP, and providing 
a best-fit model for the given data, taking into account 
both flux loss and the corresponding increase in variance 
(Rene Heller (2014), priv. comm.). 

2.1. Maximizing exomoon detection efficiency 


A literature search has brought up only one example 
for an application of the OSE to real data (Heller et al. 
2014); a null result for KOI189.01 1 . For the SP, no ap¬ 
plication could be found. Thus, no significant experience 
on potential error sources is available. In the following, 
we have tried to account for all relevant potholes, but (as 
is always the case), there might be further tests required. 
We consider the methods and tests in this work as not 
matured enough to claim any exomoon detection. We 
will present our test results using several candidates as 
examples. The advantage of our approach is that only 
minimal CPU time and human labor (< lhr per candi¬ 
date) is required, after the required tests are established 
and implemented in software. Then, our framework can 
select candidates from a large sample, and suggest these 
for an in-depth analysis using photodynamic modeling as 
established by Kipping (2011). This modeling requires 
~50yrs of modern CPU time for a single moon, plus con¬ 
siderable human effort to assess the Bayesian evidence of 
the result. By pre-selecting promising candidates for the 
modeling, the total output (per human labor time and 
per CPU-year) of exomoon detections can be maximized. 

2.2. Data preparation 

We employed the largest database available: High pre¬ 
cision time-series photometry from the Kepler space¬ 
craft, covering 4 years of observations (Caldwell at al. 

2010) . Based on a list of all confirmed (821) and uncon¬ 
firmed (3,359) transiting Kepler planets (Wright et al. 

2011) 2 , we downloaded their Kepler long-cadence (LC, 
30min) datasets. For the OSE, LC data is equally well 
suited to SC (short cadence, lmin) data, because the ef¬ 
fect is spread over a long (»30min) time. The stacking 
of many transits and binning makes it irrelevant where 
exactly the “missing photons” are placed. For the SP 
however, SC data is preferable, due to the smearing ef¬ 
fect (Simon et al. 2012). The background is that, on 
average, half of the flux loss during ingress and egress is 
shared with nominal flux, if the bin size is too large. The 
subsequent scatter measure is then less sensitive. 

As LC data only requires 1/30 of the SC space and pro¬ 
cessing time, we have only processed SC data for those 
candidates that triggered an OSE. The bluring effect also 
causes additional errors on a planet (and to a lesser ex¬ 
tent moon) fit; but the errors are small and most relevant 
for short transit durations. Usually they cause an error 
of ~0.01% for the planetary radius, much smaller than 
the formal errors from an adjusted fit (Gilliland et al. 
2010). 

2.3. Automatic Data processing 

The detrending for our fully automatic routines was 
very conservative by only fitting a line to each Kepler 
quarter and star (while masking times of transits), and 
normalizing the pre-search data conditioning (PDCSAP) 
(Smith et al. 2012) flux. Only for interesting can¬ 
didates we have manually detrended the raw SAP flux 
data, to check for potential PDC conversion errors. The 

1 After publication of the work by Heller et al. (2014), the 
system was shown to be an eclipsing stellar binary, rather than a 
planet, by Diaz et al. (2014). 

2 www.http://exoplanets.org, list retrieved on 18-Nov 2014 
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result was visually inspected, and all those stars deviat¬ 
ing from a flat baseline were directly rejected from further 
analysis. This decreased the number of candidates, but 
avoided complicated (and often ambiguous) detrending 
of e.g. (often partially) variable stars, or complicated 
instrumental trends. 

For the remaining 2,328 exoplanets, phase-folds and 
scatter measures were calculated in 10 bins before and 
after the transits. The bin width was 0.5 transit lengths, 
with the exception of the two bins directly before/after 
the planet transit, which were shortened to avoid con¬ 
tamination from transit timing or transit duration vari¬ 
ations. 

We will present the results from processing all planets 
in section 3. Afterwards, we will discuss the framework of 
tests for false positives (section 4) and apply this toolbox 
to example candidates in section 5. 

3. RESULTS FOR PROCESSING ALL PLANETS 

We created a combined phase-fold of all useful data, 
to search for an average exomoon effect. For the 2,328 
transiting exoplanets with good data quality, the tran¬ 
sit duration was transformed to unity, and the flux was 
measured in 10 bins before and 10 bins after the phase- 
folded transits. Then, we added a buffer of 10% of the 
transit duration both before and after the transits to be 
excluded from any further analysis, so that any transit 
duration uncertainty, or transit timing/duration varia¬ 
tions are also excluded from mismatch with exomoon flux 
loss. By visually inspecting all phase-folds, we removed 
several candidates that showed excessive TTVs/TDVs. 
We find that planets which exhibit >10% TTVs/TDVs 
(more than our buffers) are visually easy to identify; it 
might however be a valid approach in future processing 
to calculate individual TTVs/TDVs for each planet (and 
transit), and adjust the stack accordingly. 

We tried normalizing the transit depths, but found this 
only added noise. 

To sum up, all transits of all planets were transformed 
in width to one big meta phase-fold (“super-stack”). We 
have then examined the flux of this before and after the 
actual transit, in order to search for flux loss caused by 
exomoons, as described by the orbital sampling effect. 

3.1. Data cleansing 

The resulting data were strongly dominated by out¬ 
liers. This is understandable as many factors contribute 
to different transit depths, and noise levels: The size 
and apparent brightness of the host star, the size of the 
planet, stellar variability on a scale lower than rejected 
by our visual inspection, instrumental differences, and 
others. We decided on a set of filters to remove outliers, 
and included the stellar brightness (< 13mag in J as 
measured by 2MASS), the scatter per star (we kept the 
better half), and the data completeness (we required 80% 
of the Kepler spacecraft duty cycle). Accepting all stellar 
brightnesses, and all data completeness levels does only 
slightly deteriorate the results. This is a trade-off hard 
to quantify, as one adds both signal (more stars) and 
noise (lower quality stars). We found, however, that the 
scatter of the individual lightcurve is of great influence. 
Our results are also consistent for the case of removing 
only the few very strongest outliers, so that we consider 
the data to be robust. 


It is important to mention a selection effect from 
this filter choice: When rejecting dimmer and/or noisier 
stars, the average stellar radius changes. Smaller stars 
(e.g. M-dwarfs) exhibit more stellar noise (Basri et al. 
2013) and are usually less luminous. Consequently, our 
sample is shifted towards larger radii. While our sample 
has an average stellar radius of 1.24f? 0 , the sample of 
planetary periods 35-80d, without further filtering, has 
an average of l.lli?Q. In the future, more refined exo- 
rnoon studies might take this into account. 

3.2. Results for the “super-stack” 

We find a strong correlation of flux loss to the plane¬ 
tary orbital period. Figure 3 shows our stacked flux for 
652 planets with periods < 10d (left) and for 177 plan¬ 
ets with periods 35d < P < 80 d (right). Short-period 
planets are not expected to possess moons (Cassidy et 
al. 2009), so that this group can be used as a reference 
point. However, a significant flux-loss just before and 
after the planetary transit is also apparent in the < lOd 
group (and most prominent in the very shortest period 
planets), perhaps due to a slight leak of transit flux loss 
into the neighboring bins, despite our buffers. Indeed, 
when neglecting the bins closest to the planetary transit, 
the dips are gone. 

When plotting only the 177 longer-period (35 d < P < 
80d) planets, an additional flux loss appears, starting 
and ending at ~ 2 planetary transit durations. As will 
be explained in section 5.9.1, at least 15-20 transits are 
required before the OSE can build up in a single case. 
When sampling many longer-period (> 80d) planets, a 
similar flux loss should appear, which we do not detect. 
We attribute this to our sample size: While we have a 
total of 4,631 transits (from 176 planets) with periods 
35d < P < 80 d, we have only 727 transits from 77 plan¬ 
ets in the group for longer-period planets. The smaller 
number is caused by two reasons: First, Kepler is less 
sensitive in detecting longer period planets (at a given 
planetary radius), as less transits can be stacked. This 
reduces the sample sizes. From this reduced sample, only 
224 planets survived our visual rejection of excessively 
noisy candidates, as discussed in the previous section. 
Finally, when applying the formal criteria (brightness, 
jitter, data completeness), only 77 remain. This lower 
number of only linearly detrended light curves might not 
be sufficient for the detection of the OSE in this group. 
We encourage future studies to try different selection cri¬ 
teria, and detrending methods, in order to explore the 
parameter space of flux loss for all planetary periods. 

For the planets with periods of 35d < P < 80d, the 
effect is measured as an average flux loss of 6 ± 2 x 1CD 6 
(6ppm), corresponding to a moon radius of 2120 /l 37 Qkm 
for the average (1.24i? 0 ) star in this sample. Also, a 
significant scatter peak is detected, with the bins affected 
from the OSE having a scatter of 2.7±0.2 ppm, compared 
to the baseline of 1.9ppm. The detection of both effects, 
the OSE-like signal and the SP, may indicate exomoons 
as the underlying cause. 

4. FRAMEWORK FOR INDIVIDUAL CANDIDATES 

A simple dip in the data is necessary, but not nec¬ 
essarily sufficient for the detection of an exomoon. As 
will be explained in section 4.4, we have found 4 out of 
56 planets to exhibit such a dip at the 95% significance 
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Fig. 3.— Top left: Stacked phase fold for periods < lOd. Flux loss is only apparent very close to the planetary transit. Top right: Periods 
35d..80d with flux loss in neighbouring bins. Dashed line shows a symmetrical linear regression. Middle left: Scatter peak for periods 
35d..80d is clearly detected, here shown in symmetrical form. Middle right: Cumulative flux loss (positive values) depends strongly on the 
planetary period. The exomoon effect from OSE is visible for P> 35 d (vertical line), exceeding the uncertainty (grey shading) for P> 40 d. 
Bottom left: Planetary period versus flux loss/gain. While any flux gain is likely caused by noise, it shows the high overall scatter of this 
work. The average (linear trend as dotted line) shows significant (p = 0.048) flux loss. Kepler-264b is marked with a cross. Bottom right: 
Histogram of planets between 35 and 80d periods. Of the 176 planets, 101 show a flux loss (57%). While many of these might be false 
positives (also on the plus side), the distribution is already significantly skewed in a binomial test (p = 0.029). 
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level, likely not caused by TTVs/TDVs. One method 
would be to refer these candidates to photodynamical 
modeling, but this is very expensive in terms of labor 
and CPU-time. Therefore, we have developed a frame¬ 
work of tests for false-positives. This can either reduce 
the number of candidates, or at least rank and prioritize 
them for further analysis. 

4.1. Required results from the OSE 

For the presence of a moon, we propose the following 
criteria to be checked: 


OSE1 A significant (p = 95%) flux loss before and after 
the planetary transit. 

OSE2 A significant (p = 95%) trend towards more flux 
loss at closer to the transit than further away. 

OSE3 A significant (p = 95%) flux loss shall not appear 
when only a small number of transits (n < 10) is 
used. 

OSE4 The dips before ingress and after egress shall be the 
largest dips in the dataset. 

OSE5 The dips should not be caused by star-spots. 

OSE6 The dips should not be caused by rings. 

OSE7 The amount of flux loss needs to be physically plau¬ 
sible. 

Cl The signal shall not be located in a small part of 
the data only, but be contained in > 50% of it. 


The choice of significance level, p = 95% (~2cr) can of 
course be varied, depending on the data quality and the 
capacity for follow-up analysis. To calculate this signifi¬ 
cance, our null hypothesis is a normalized nominal flux. 
We then perform a two-sample mean-comparison test 
with the potential flux loss data as one group, and the 
reference data (long-out-of-transit) in the other group: 


t = 


(a-’i - x 2 ) - A 


( 1 ) 


I Q2 

ni ' 712 


where X\ and x 2 are the means of the two samples, 
A is the hypothesized difference between the population 
means, Si and s 2 are the standard deviations and n\ and 
n 2 are the sizes of the two samples. 

This mean-comparison test gives the two-sample t 
statistic, for which we calculate the p-values using the 
t(k) distribution. The test slightly underestimates the 
OSE significance, as it does not account for the non¬ 
linearity in slope of the OSE. 

We have considered a similar measure, the signal-to- 
noise ratio for transits (Jenkins et al. 2002; Rowe et al. 

2014), which compares the depth of the transit mode 
compared to the out-of-transit noise: 


S/N = (2) 

(jot 

with Nt as the number of transit observations, Td ep 
the transit depth and (Jot the standard deviation of out- 
of-transit observations. We find that both measures give 


similar results, and choose to continue giving significance 
levels (p-values) in the following. For future studies, we 
encourage the creation of a significance measure which 
fully accounts for the non-linear curve shape of the OSE. 

When not noted otherwise, the bin length for the out- 
of-transit group was taken to be one half transit duration 
before, and after planetary ingress (egress). 

Planets on very short orbits are increasingly unlikely to 
host moons, due to mass loss (Cassidy et al. 2009), very 
small Hill radii, and orbital decay due to tides (Barnes 
& O’Brien 2002). As the range of stellar and planetary 
temperatures, masses and radii is large, a single limit 
cannot be given; estimates are in the range of 5-15 days 
for the minimum orbital period of the host planet. For 
long-period planets, there are not enough transits in the 
data for the build-up of the OSE. For a minimum of 20 
transits within the 4 years of Kepler observations, the 
maximum period for possible detection is then 73 days. 
Taking only planets with periods of 15-73 days, we have 
311 confirmed planets plus 958 “Kepler Objects of Inter¬ 
est” (KOIs). Using the traditional 95% level significance 
level, and detecting a dip both before and after plane¬ 
tary transit, results in a total false positive probability 
(per candidate) of 2 x (1 — 0.95) = 0.0025. Thus, one 
can assume a total of 0.0025 x 1269 = 3 false positives, 
a number which can further be reduced with the other 
tests in our framework. Therefore, we suggest to be re¬ 
laxed towards the requirements and also accept partially 
negative test results, especially as the tests sometimes 
yield false negative results. 

Regarding OSE1, it is useful to treat ingress and egress 
separately, and also binned together (section 5.2). As 
shown in Figure 1, the dip has a defined shape, most 
strongly dependent on the number of moons, their radii 
and semi-major axes. As these parameters can only 
be estimated, we suggest to perform a test (OSE2, sec¬ 
tion 5.3) for the slope: The flux loss should be greater 
towards planetary transit, than further away from it. We 
use a linear regression to test for a slope, knowing that 
it only roughly approximates the true OSE curve. To 
quantify the result of the linear regression, we again give 
p-values to judge the rejection of our null hypothesis of 
a constant flux (zero slope parameter). 

Furthermore, such a dip emerges only after sampling a 
sufficient number of transits. This can be tested (OSE3, 
section 5.4), together with numerical simulations for the 
number of transits required. In this section, we will se¬ 
lect (many) randomly chosen subsection of the data, and 
check if they are consistent (within the errors) with the 
total data, so that most, and not just a few data segments 
contain the potential exomoon signal. 

Then, when the dips have been established as genuine, 
we need to make sure that such is not common within 
the dataset: The two dips should be, ideally, the deepest 
in the whole dataset (OSE4, section 5.5). 

After this step, in case the dips could not be explained 
as stellar or instrumental noise, we shall consider a phys¬ 
ical cause. Dips can be caused by star spots (OSE5, 
section 5.6) or, more interestingly, by rings (OSE6, sec¬ 
tion 5.7). We will explain our methods in detail in these 
sections. 

If these can be ruled out (or appear unlikely), we shall 
seriously estimate the actual star-planet-moon configu¬ 
ration and its stability, to check whether the system 
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is physically possible, plausible and stable (OSE6, sec¬ 
tion 5.8). These tests include the discussion of long-time 
stability, and an orbit within the Hill-radius and outside 
the Roche-lobe. 

Inspired by Kipping et al. (2014), we also employ a 
useful data consistency check, demanding that the signal 
should be spread broadly in the data, and not be located 
in only a part of it (Cl, section 5.9.1). 

4.2. Required results from the SP 

Equally, we define the following criteria for the detec¬ 
tion of a scatter peak: 

SP1 A significant (p = 95%) scatter peak before, at and 
after the transit. 

SP2 Shifting the period shall not significantly decrease 
the scatter peak. 

SP3 Accounting for TTVs and TDVs. 

SP4 The strength of the scatter peak shall be astrophys- 
ically plausible, and scale with the amount of data 
used. 

C2 The signal shall not be located in a small part of 
the data only, but be contained in > 50% of it. 

While SP1 is self-explanatory (section 5.10), SP2 and 
SP3 (section 5.11) are more complicated as the transit 
folding might not be perfect due to uncertainties, TTVs 
and TDVs. Lastly, as for the OSE, the amount of scat¬ 
ter should correspond to the estimated star-planet-moon 
configuration, and it should be be spread over all of the 
data (section 5.12). 

4.3. Manual detrending and baselining 

The initial automatic search process had only em¬ 
ployed a linear detrending and normalization, which is 
sufficient to reject the vast majority of negative candi¬ 
dates. For the remaining interesting candidates, how¬ 
ever, great care must be taken regarding detrending and 
baselining. Usually, the baseline used for transits is the 
time immediately before and after the transit. This 
method, by definition, renders the orbital sampling ef¬ 
fect invisible. Instead, we need to derive the baselines 
from times long (> 2 transit durations) before and after 
the actual transits. The detrending must also protect 
not only the times of transit, but also at least two tran¬ 
sit duration before and after the transit, in order not 
to smooth out any flux loss. Wrongly detrending right 
though the (moon and/or planet) transit dips destroys 
the OSE: The fitted curve goes deeper (as influenced by 
the dip), and thus the resulting detrended data is higher 
in flux, so that any flux loss from (averaged, stacked) 
OSE is lost. 

We examined both the raw Kepler data (Simple Aper¬ 
ture Photometry, SAP) and the Presearch Data Condi¬ 
tioning (PDC) result of the Kepler data analysis pipeline. 
The latter tries to remove discontinuities, outliers, sys¬ 
tematic trends and other instrumental signatures, while 
preserving planet transits and other astrophysical sig¬ 
nals. It uses a Bayesian Maximum A Posteriori (MAP) 
approach “where a subset of highly correlated and quiet 
stars is used to generate a cotrending basis vector set 


which is in turn used to establish a range of reasonable 
robust fit parameters.” (Smith et al. 2012). 

For the SAP data, we removed outliers with a sliding 
median and a 3<r filter. Afterwards, we tried two com¬ 
mon detrending methods with both data sources. First, 
we subtracted a least-squares parabola fit to an appro¬ 
priate time before and after mid-transit (e.g. Santerne et 
al. (2014); Gautier et al. (2012)). For Kepler-264b, we 
used 3 days before and after. Alternatively, we applied 
a sliding median to the whole dataset (e.g. Carter at al. 

(2012); Tal-Or et al. (2013)). For both methods, we 
removed data points affected by other transiting plan¬ 
ets in the system, if present. Also, to protect the very 
slight putative moon flux loss, we blinded the detrend¬ 
ing algorithm for the time span of 1.5 transit durations 
from mid-transit, so that it could not remove such a dip. 
For Kepler-264b and the median detrending, a boxcar 
length of 2-3 days was optimal, as it steps over the blind 
period and still adapts to trends on the order of a few 
days. For best results, these times need individual ad¬ 
justments for each candidate. It is important to ensure 
that an adequate range of detrending parameters gives 
similar results, in order to qualify the chosen detrending 
methods and parameters as robust. A useful cross-check 
is to slowly shrink the protected flux area, and/or shorten 
the boxcar length of a sliding median detrending. As a 
consequence, the OSE must vanish, as the detrending 
adapts more and more to putative slight moon dips, and 
removes them. The choice of the size and duration of the 
boxcar is also a crucial aspect for the detection of a SP 
signal. For a detailed discussion, we refer to Simon et al. 

(2012) (their section 4.2). 

We find that the results are consistent with SAP and 
PDC-SAP data, but not always for parabola (polyno¬ 
mial) and median detrending, as listed in Table 1. The 
PDC-SAP data gives less noise, as is expected due to the 
additional co-trending vectors which effectively remove 
instrumental trends. Most of the noise in PDC-SAP is 
likely due to astroseismic jitter - section 5.6 will discuss 
the (likely) rotational period of e.g. 11.Id for Kepler- 
264, accounting for an amplitude of 81ppm with strong 
sidelobes. 

For reference, we show the detrending results for PDC- 
SAP data and both detrending methods for Kepler-264b 
in Figure 4. The results differ only slightly. Unfortu¬ 
nately, these detrending methods cannot be used for the 
scatter peak, as will be discussed in section 5.10. We 
list the range of adequate detrending parameters in Ta¬ 
ble 1. We also release an interactive spreadsheet 3 for the 
interested reader. In this sheet, masking times and box¬ 
car length can be modified, and the resulting flux loss is 
displayed instantly. The accompanying website provides 
an introduction to the parameters, and explains possible 
conclusions. 

4.4. Sensitivity 

We have performed a deep analysis for a sample of 
56 planets, in order to derive limits for the sensitivity 
to an exomoon with respect to the stellar radius, stellar 
brightness and host planet size. We included interesting 
suitable examples from previous searches (Kipping et al. 

2013a, b; Heller et al. 2014; Kipping et al. 2015a) 

3 http://jaekle.info/ose/detrending.html 
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TABLE 1 

Detrending parameters for the candidates 


Parameter 

241b 

241c 

264b 

367.01 

Median boxcar [days] 

1.0 - 2.0 

1.0 - 2.0 

2.0 - 3.0 

0.2 - 1.0 

Masking [% of transit durations] 

150 - 300 

150 - 300 

150-200 

150 - 300 

Visual fit quality 

Very good 

Very good 

Bad 

Excellent 

Same result from polynomial detrending 

Yes 

Yes 

Yes 

No 


Parameter ranges give very similar results. 




Time in days 




Time in days Time in days 


Fig. 4.— Detrending with a parabola (left) and sliding median (right) of the last recorded transit of Kepler-264b at Barycentric Julian Date 
~ 1542 (MJD-2454833). Grey dots are excluded from the fit to protect the potential flux loss from the OSE (top). Bottom: The parabola 
result has an average normalized flux (excluding transit and potential exomoon time) at slightly above 1, which has to be normalized for 
every single transit individually, so that the flux for the time long before and after the transit is set to 1. Note that the results from both 
methods are very similar, which is a useful consistency check. 


with periods between 15d-80d. These were Kepler-231c 
(KOI-784.01), KOI-314.01, KOI 314.02, KOI-463.01 and 
KOI-189.01; all with null results as expected. In ad¬ 
dition, we selected mostly planets around M-dwarfs, as 
they exhibit deeper transit depths due to their smaller 
radii, plus some randomly chosen planets in order to ex¬ 
plore the parameter space of sensitivity. This doubles 
the sample of currently examined planets (Kipping et al. 

2015b). 

For the limits, we assumed a circular moon with zero 
inclination and eccentricity, on a medium-wide orbit of 
two planetary transit durations, which corresponds, on 
average, to a semi-major moon axis as 30% of the Hill 
radius. For every candidate, we have calculated the syn¬ 
thetic OSE light curve for such a moon, and varied only 


one parameter: Its radius, which affects the depth of the 
OSE curve. Then, we have compared the real flux per 
candidate, using the null hypothesis as described in crite¬ 
ria OSE1: A normalized, nominal flux with “no dip”, rep¬ 
resenting no moon flux loss, and tested with a two-sample 
mean-comparison test. Again, this underestimates sen¬ 
sitivity during ingress/egress, but we prefer to have the 
error on this side. 

We started our iterative search with a large (2R®) 
moon, which produces a deep OSE curve. For this initial 
configuration (and all following) we then performed the 
two-sample mean-comparison tests, asking whether the 
real data and the simulated OSE curve are significantly 
different (at the 2 a level). This was the case for all 56 
planets, therefore rejecting the presence of a (2 Rq) moon 
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Fig. 5.— Upper limits (2cr) of moon radii for a sample of 56 planets analyzed in-depth; and histogram of the distribution. 


for all planets in the sample. We have then gradually de¬ 
creased the tested moon radius (in 0.01R® steps) and 
repeated the tests, until the real data and the resulting 
OSE curve crossed our 2er threshold. This crossing is the 
result shown in Fig 5, giving the sensitivity at which a 
moon is just rejected with 2 a confidence in a two-sample 
mean-comparison test of the two datasets. 

We have cross-checked this method with our full signal- 
injection and retrieval process as detailed in section 5.8.2. 
In this section, we will perform a full numerical modeling 
of the curves, inject these data and retrieve the OSE 
curve. The sensitivities with both methods are identical, 
as is expected. Therefore, we judge our sensitivity limits 
to be robust. 

To clarify definitions, in this paper we define orbits 
with semi-major moon axes >50% of the Hill radius 
as “wide” orbits, and those >25% as “medium-wide”. 
Compared to the Galilean moon system around Jupiter, 
where the innermost moon Io and the outermost moon 
Callisto orbit Jupiter between 0.8% and 3.5% the Jovian 
Hill radius, these limits are actually very large. As can 
be seen in Fig 6, detection sensitivity drops for small or¬ 
bit moons, so that these cases will be harder to detect. 
Therefore, one can expect a strong detection bias in the 
present, and future work towards wider-orbit moons. 

We have used a custom median detrending for each 
candidate. Figure 5 shows the results, presenting 2cr 


(95% confidence interval) upper limits. In the best case, 
the OSE method is sensitive down to R ~ 0.32J? e , 
which is almost the size of Earth’s moon. In 6 of 56 
cases, the method is sensitive to a Ganymede-sized moon, 
in agreement with Heller (2014). We find a correlation 
of sensitivity and stellar brightness: A linear regression 
returns a slope >0 with significance p = 0.011, touching 
the 1% (3cr) significance level. Furthermore, the test by 
Breusch & Pagan (1979) for heteroskedasticity in this 
regression is also significant, with p = 0.001. As is vi¬ 
sually evident from Figure 5, the scatter in sensitivity 
increases for stars with lower apparent magnitude. We 
speculate that the true slope is not linear, but quadratic 
or logarithmic; however due to the small number of data 
points we will not perform an in-depth analysis in this 
paper. It is for future studies to determine the underly¬ 
ing parameter space. 

To illustrate the data quality and approach, Figure 7 
shows the null result for KOI-189.01, confirming the re¬ 
sults from Heller et al. (2014), and now giving a sen¬ 
sitivity limit. The 2tr upper limit in this case is a high 
l?(j < O.84i?0, suffering from the relatively dim host star 
(A"p=14.4). As mentioned, this KOI has been shown to 
be not a planet (but an eclipsing binary star), but for 
this illustration the cause for the transit dip does not 
matter. 

Furthermore, we have created numerical simulations 
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Time from mid-transit (days) 



Fig. 6 .— Top: Comparison of a single moon transit (solid line) 
and the corresponding OSE, for a wide (dashed) and a medium¬ 
wide (dotted) orbit. The radius is kept constant. The integral 
for all three curves must be identical. Bottom: OSE dips caused 
by different semi-major axes of a putative exomoon R^ = l.li ?0 
(grey lines) orbiting Kepler-241c (black line). Plot shows egress 
with moons of transit durations 0.4 - 0.6 days. The area under each 
moon integral is identical, but the percentages of flux loss occurring 
outside of planetary transit is different: Wider orbit moons are 
more easily detected. 



Fig. 7.— Null result (straight line) for KOI-189.01. A hypothet¬ 
ical OSE (dashed line) with R^ = 0.84 R® can be rejected with 
2cr confidence. This reflects the level of significance chosen for our 
sensitivity results. Dotted and square symbols show data from 
ingress and egress, respectively; data points with uncertainties are 
combined bins of 30min length. 


for a range of moon semi-major axes, in order to explore 
the parameter space with regards to detectability. We 
kept all parameters constant and only varied the semi¬ 
major axis. As can be seen in Figure 6, wider moon 
orbits result in longer transit durations (in the stacked 
OSE sense), but with lower transit depth. This is be¬ 
cause the total flux loss (in photons of all photometry) 
is always the same, so that the integral, i.e. the area 
under each curve is identical. To illustrate the effect, let 
us assume a moon with a very small semi-major axis just 
outside the Roche lobe. The corresponding OSE curve 
will then mostly overlap with the stacked planetary tran¬ 
sit. As we have argued in section 2, it is very hard to 
detect flux deltas during stacked planetary transit, due to 
limb darkening uncertainty, and the unknown planetary 
transit depth. Outside of planetary transit, however, the 
putative flux loss of a moon can be compared to the nom¬ 
inal, normalized flux. Thus, we argue that moons with 
wider orbits are more easily detected. We show the frac¬ 
tion of flux loss that occurs outside of planetary transit, 
for one specific example, in Figure 6 (lower panel). It 
is also worth mentioning that all but the smallest orbit 
moons exhibit a feature dubbed “right and left wings” 
by Heller (2014). This gives a certain shape to the OSE 
that makes it, when detected, easier to distinguish from 
a simple linear flux drop, e.g. due to stellar red noise. 

Depending on the orbit configuration, an unknown 
fraction of the signal coming from moon transits occurs 
during planetary transits, which is harder to detect. As 
the distribution of exomoons is unknown, the average of 
this fraction is also unknown; it can be high (e.g. 70%) if 
moons tend to have small orbits, or lower (e.g. 30%) for 
wider orbits. If no modeling for the OSE during plan¬ 
etary transit is made, as in this present work, this cer¬ 
tain fraction of the OSE signal is lost. For an ideal, 
co-planar case, and assuming such modeling can be done 
with future theoretical advances, up to 100% of the OSE 
signal can be retrieved. This puts the OSE method at 
an advantage, when compared to photodynamical mod¬ 
eling by Kipping et al. (2009): The OSE can lever¬ 
age stacking to reduce red stellar noise, which has been 
shown to be particularly useful for short-time (<10hrs) 
time-correlated noise by Hippke (2015). Photodynami¬ 
cal modeling, however, has to rely on individual transits, 
of which each is prone to time-correlated red noise. Of 
course, this applies only to moons where the OSE is sen¬ 
sitive to, i.e. the coplanar case, as discussed in section 2. 

We found six candidates with a significant flux loss, 
as defined in section 4 by using the two-sample mean- 
comparison test, and discuss these in-depth in section 5. 
Out of these, two or three are identified as likely false 
positives. For reference, we have also found 3 planets to 
exhibit a significant (95% confidence interval) flux gain , 
consistent with statistical expectations. 

4.5. Shape of the OSE 

The shape of the phase-folded orbital sampling effect 
varies mainly with the moon’s radius, its semi-major axis 
and the orbital configuration of the system. For a single- 
moon case, Figure 8 shows the three main cases and their 
OSE curves: A near co-planar configuration is the simple 
case, and can serve as the standard hypothesis. This is 
what we find among 75% of large solar system moons, as 
discussed in the introduction. It has a symmetrical shape 
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Fig. 8. — Top: Orbital configurations showing star-planet-moon: 
Co-planar, near-zero inclination (a); inclined (b) and eccentric (c) 
so that part of the moon transit is not observed (outside dashed 
horizontal lines). All three situations show the planetary mid¬ 
transit. In plots (a) and (c), the moon transits after the planet. In 
plot (b), the moon does not transit in this specific epoch. Individ¬ 
ual transits are almost always snapshots of a given orbital configu¬ 
ration: The moon orbits the planet, but on a much longer timescale 
(days) than the transit duration (hours). Bottom: Inclined orbits 
(dotted) are still symmetrical, but less deep. Eccentric orbits (al¬ 
most) lack flux loss on one side, depending on the eccentricity. 
Grey vertical lines show the planetary transit duration. 

varying in depth and width depending on the semi-major 
axis (compare also Figure 6). 

The second case is an inclined moon, tilted towards the 
observer. This configuration still results in a symmet¬ 
ric OSE, but reduced depth and thus detectability. The 
greatest difference in shape is during planetary transit, 
making this configuration hard to distinguish from the 


co-planar case (this is because the OSE is harder to ob¬ 
serve during planetary transit). 

A more severe case is eccentricity (or an inclination 
shifted laterally from the observer), as shown in the bot¬ 
tom panel. The resulting OSE is highly asymmetric, 
showing that cases with flux loss on mainly one side are 
not necessarily false positives. In practice, with noise 
added, one would find no dip on one side (ingress or 
egress), and a regular dip on the other side with the best 
possible Aep/er-quality data. 

These results can be reproduced with the interactive 
spreadsheet (section 5.8.1), by removing the data points 
(i.e. setting them to flux=l) from the numerical sam¬ 
pling which are are not observed due to inclination or 
eccentricity. This introduces a slight error for the eccen¬ 
tric case, as it does not account for the change in velocity 
of the moon. We encourage modelers to advance the the¬ 
ory in order to account for this effect. 

With Kepler -class photometry, we do not expect to be 
able to make significant distinctions between these cases. 
This might be possible with future instruments such as 
PLATO 2.0. However, in case of a moon found with 
other methods, for instance using photodynamical mod¬ 
eling, the postulated configuration can be cross-checked 
with its expected OSE curve. Both should agree, within 
the errors, to make it a consistent exomoon detection. 

5. RESULTS FOR THE APPLICATION OF THE 
FRAMEWORK TO INDIVIDUAL CANDIDATES 

Our in-depth analysis of 56 stars (as explained in sec¬ 
tion 4.4) gave six candidates worth further examination. 
The following sections will give a brief overview to these 
systems, and then proceed to the individual tests. 

5.1. Candidate summary 

We summarize the six exemplary candidates in Table 2 
and describe them in the following sections. Kepler- 
1026 and Kepler-202c are similar to KOI-367.01, as they 
show a flux loss following criteria OSE1 only when us¬ 
ing a median detrending, but not when using polynomial 
(parabola) detrending. This case will therefore be dis¬ 
cussed using KOI-367.01 as an example. 

5.1.1. Kepler-264b 

Validation of the planets b and c in the Kepler-264 
system was done by Rowe et al. (2014) using data for 
Q1-Q15. We have checked these results by using the 
PDCSAP data with subsequent median-filtering, as de¬ 
scribed in section 4.3. Then, we have fitted the standard 
Mandel & Agol (2002) transit model, as implemented by 
PyAstronomy 4 . The results for the orbital and transit 
parameters are the same, within the errors, as in Rowe 
et al. (2014). We have not searched for additional plan¬ 
ets, and have not tried to reproduce the false-positive 
framework of the original authors - our intention was to 
double-check the transit model result. 

We found an OSE-like signal for planet &, but also 
mention planet c in the following do give some context 
on this particular system. 

Both planets b and c are best described as warm/hot 
Neptunes. As can be seen in Figure 9, a single 

4 https://github.com/sczesla/PyAstronomy 
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TABLE 2 

Best-fit moon results 


System 

K P 

R*[R©] 

P P [days] 

Rp [77©] 

R« [77©] 

a & P [10 4 km] 

Reference 


Kepler-241b 

15.4 

0.668 ±0.289 

12.718092 ± 0.000035 

2.30 ± 1.03 

1.1 ± 0.1 

450 ± 90 

Rowe et al. 

(2014) 

Kepler-241c 

15.4 

0.668 ±0.289 

36.065978 ± 0.000133 

2.57 ±1.11 

1.4 ±0.1 

320 ± 60 

Rowe et al. 

(2014) 

Kepler-264b 

13.0 

1.550 ±0.310 

40.806230 ± 0.000540 

3.33 ± 0.74 

1.6 ±0.2 

660 ± 120 

Rowe et al. 

(2014) 

KOI-367.01 

11.1 

0.980 ±0.185 

31.578659 ±0.000007 

4.47 ± 0.55 

0.8 ±0.1 

150 ± 50 

Rowe et al. 

(2015), DV a 

Kepler-102e 

11.5 

0.740 ±0.020 

16.145700 ±0.000100 

2.22 ± 0.07 

0.5 ±0.1 

300 ± 40 

Marcy et al. 

(2014) 

Kepler-202c 

13.9 

0.667 ± 0.034 

16.282493 ± 0.000038 

1.85 ± 0.10 

0.7± 0.1 

480 ± 90 

Rowe et al. 

(2014) 

For reference: Ganymede has R(f = 

0.4177© 







a Kepler Data Validation Pipeline, http://exoplanetarchive.ipac.caltech.edu 


(non-stacked) transit of a hypothetical exomoon around 
Kepler-264b with 1.6i?® (90ppm) does not provide suf¬ 
ficiently low noise even for a detection of a large body. 
Following equation (2), the S/N during a single moon 
transit is ^2.4, which is too low to claim a detection at 
the traditional threshold for S/N of 7 (Jenkins et al. 
2002; Rowe et al. 2014) or even 10 (Fressin et al. 
2013). Due to this low S/N, we do not try to fit individ¬ 
ual moon transits, as is done in photodynamical model¬ 
ing. Consequently, Figure 9 should be understood only 
as an illustration. 

During the operation of the Kepler spacecraft, a to¬ 
tal of 36 transits occurred. We could use a total of 27 
transits, while 9 were lost to data missings. 

As in Rowe et al. (2014), we only find insignificant 
TTVs and TDVs on the order of a few minutes, for both 
planets b and c, as is expected in such relatively long- 
period configurations (Awiphan & Kerins 2013). We 
have tried the further processing described in this paper 
with and without adjusting to these insignificant varia¬ 
tions, and get identical results, both for the OSE and the 
SP method. Following results are thus derived from the 
linear ephemeris, neglecting TTVs and TDVs. 

As an additional test for errors, we have added a buffer 
time of 2 cr of the uncertainty of the transit duration be¬ 
fore and after the transit, so that an additional time of 
0.028d before and after transit was excluded from further 
analysis. This did not change the results significantly, so 
that in the following we only exclude the nominal un¬ 
certainty (0.014d) from the analysis. These TTV/TDV 
treatments are the same for all following candidates. 

5.1.2. Kepler-2flb and -c 

Two planets are known to orbit Kepler-241, a K-star 
with small radius (0.668 ± O.289i?0). They have periods 
of 12.7 ( b ) and 36.1 days (c), and radii of 2.3 ± 1.03 and 
2.57 ± l.lli? 0 , making them both Super-Earths. Inter¬ 
estingly, we have detected a flux loss for both planets, 
with b having a short period which might cause stabil¬ 
ity issues for moons. We have tested a wide variety of 
detrending parameters to check if both planets were af¬ 
fected by detrending issues, but found this star particu¬ 
larly simple to detrend: The PDCSAP flux shows only a 
weak (~ 0 . 1 %) variation, likely due to rotation and spots, 
on a timescale of > 10 days, which is easily and cleanly 
removed with a sliding median. Also, parabola detrend¬ 
ing gives virtually identical results. 

To derive limits for the Hill radii of these planets, we 
have taken the values for the stellar radius (0.668 ±0.289 
Rq) and density (p* = 3.257 ± 1.171g cm -3 ) from Rowe 
et al. (2014). This results in a best-fit stellar mass of 
0.69M 0 . For rocky planets with a density of 5g cm -3 , 
the Hill radii would be 409,000km (b) and 901,000km 



Time in days (BJD+) 

Fig. 9.— Single transit of Kepler-264b at BJD=1012.6. A 
potential transit curve of a hypothetical ultra-large moon (1.6R©) 
is shown before planetary ingress, with a flux of -90ppm. Such 
a small signal cannot be reliably detected in a single transit 
observation. Binned data (dots with error bars) are only for visual 
clarity. 

(c). For a lower density of 2g cm^ 3 , the Hill radii would 
be smaller, 301,000km (b) and 665,000km (c). Conse¬ 
quently, one might argue that a potential exomoon orbit¬ 
ing 241b, in a distance of 450,000km as listed in Table 2, 
might be in tension with Hill stability, when neglecting 
stellar uncertainties. For planet c, no such conclusion can 
be made. In these cases, spectroscopy and radial-velocity 
data would help to reduce the stellar uncertainties. 

5.1.3. KOI-367.01 

KOI-367.01 is an unconfirmed planet candidate, with 
the risk of being a false positive. We have included 
this candidate due to its significant OSE-like signal, 
and the high photometric quality caused by the bright 
( K P = 11.1) host star, likely a G-star (0.98 ± 0.185 Rq) 
like our Sun. No other planets are known to orbit this 
star. Candidate .01 is Neptune-sized (4.47±0.55i?®) on 
a 31.6 day orbit. 

5.2. Results for criteria OSE1: Significance 

The significance of the dips before ingress and after 
egress can be measured easily, with the unknown factors 
being the duration of the dip, which depends on the or¬ 
bital configuration of the rnoon(s); and the transit depth. 
After preliminary estimation of its duration, we propose 
two tests: One for the full length, and one for the half 
closer to planetary transit. This is useful because the 
OSE is stronger closer to planetary transit, allowing for 
a more secure detection. We suggest that the shorter 
test must be passed for ingress and egress side, and the 
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longer test be a bonus. 

For simplicity, it can also be useful to test for a bin 
width of one (and: one-half) planetary transit duration, 
as this covers the majority of moon configurations. In¬ 
deed, for our example case of Kepler-264b, we get an es¬ 
timated flux loss starting at ~ |0.8|d from mid-transit 
(transit begins/ends at |0.297|d), so that the out-of- 
transit dip duration is ~ |0.5|d, roughly equivalent to 
the planetary transit duration (0.594d). 

For the data as presented in Figure 10, results are 
virtually identical for both detrending methods. We 
get a significant flux loss during ingress (2.0a), egress 
(4.8cr) and combined (4.8er), measured as the mean of 
the first half transit duration before (after) planetary 
transit. When taking the bin width as one full transit 
duration, we get an insignificant flux loss during ingress 
(1.2cr) and still significant flux loss at egress (4.9er). This 
means that the required test passed, and the optional 
test failed (marginally). 

The results for the other candidates are summarized in 
Table 3. Every one of the others fails one or more tests, 
but all are significant for the data at ingress and egress 
binned together. 

5.3. Results for criteria OSE2: Slope 

From the sampling of the orbital sampling effect, the 
flux loss must increase towards planetary transit. This 
can be tested with a linear regression and t-tests (or F- 
tests) for different folded time durations, as explained in 
the previous section. For Kepler-264b, the slope parame¬ 
ters before ingress (3.1cr) and after egress (2.1cr) are both 
inclined towards planetary transit, and significant. The 
test fails for the egress side of Kepler-241b and KOI- 
367.01. For the ingress side of Kepler-241c, we get a 
marginal pass. The other sides and candidates pass the 
tests. The power of the individual test result should not 
be overestimated, as the photometry is already close to 
its limit. 

5.4. Results for criteria OSE3: Stacking 

During the examination of other candidate exomoons, 
we discovered some examples were only one or a few tran¬ 
sits caused effects mimicking OSE and SP. One such case 
is Kepler-420b, where the 12 th transit at BJD~2455960 
was affected by thermal changes on the telescope (San- 
terne et al. 2014). This can be tested by splitting the 
data in segments, and analyzing these as described above. 
If only part of the data seems to cause OSE and SP, it 
must be regarded as a spurious result. 

Following the idea of Kipping et al. (2014) to use only 
part of the data and check the results versus the other 
chunk, we have created a similar test. We ask whether 
a randomly chosen subsection of the data is consistent 
(within the errors) with the total data, so that most, 
and not just a few data segments contain the potential 
exomoon signal. This test was repeated 100 times on 
randomly chosen parts of the data. For our candidates, 
we find that any 50% of the data give the same result (a 
flux loss both during ingress and egress), albeit with low¬ 
ered significance due to the lower number of data points. 
Inversely, we judge that ~ 50% of the data are required 
for the results described here, and it is irrelevant which 
half is used. 



0.9992 4-- Hr --T - 

-3-2-10123 

Time from mid-transit (days) 



1.0002 


1.0001 



0.9998 - T --i- T 

0 0.2 0.4 0.6 0.8 1 

Time from mid-transit (days) 

Fig. 10.— Kepler-264b phase-folded flux for a time of 3 days 
before and after planetary transit (top), and zoom into the region 
of one day before and after (middle). Grey dots are LC data, 
squares represent 20 LC bins. The bottom plot shows a symmetri¬ 
cal fold, with circles (squares) representing ingress (egress) 20 LC 
bins. Dashed line shows a least-squares linear fit, dottled line a 
numerical OSE fit (see also Figure 14), and the vertical line the 
planetary transit. 

In principal, OSE3 and Cl ask the same question. In 
this paper, test OSE3 focuses mainly on the detection of 
outliers, while Cl will check the actual build-up of the 
OSE in-depth, including numerical simulations. 

5.5. Results for criteria OSEf: Uniqueness 
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TABLE 3 

Test result summary for candidates, at 2 a (95%) significance 


Test 

241b 

241c 

264b 

367.01 

OSE1: Significance (l,r,b) 

/(X) % 

(X) // 

/// 

/// 

OSE2: Slope (l,r) 

/X 

(/) / 

yy 

/X 

OSE3: Stacking 

/ 

/ 

y 

/ 

OSE4: Uniqueness (a,b) 

XX 

XX 

(/) y 

XX 

OSE5: Star-spots 

X 

X 

y 

/ 

OSE6: Rings 

/ 

/ 

y 

/ 

OSE7: Plausibility 

/ 

/ 

7 

/ 

Cl: Temporal spread (OSE) 

/ 

/ 

/ 

/ 


Legend: •''’Pass; XFail; (X),(X) Marginal or partial fail/pass 


Ideally, there should be no other dips in the dataset 
that are more prominent than those before ingress and 
after egress. This criteria is closely connected with the 
actual measure of significance of such dips, but it is not 
formally the same: At any given significance level, it 
can occur solely from noise that additional dips show up, 
dips even deeper than the ones caused by the OSE. In 
a white-noise world, one could simply set a high enough 
significance limit to restrict this - and would, in return, 
loose a few moon detections to false negative results. In 
a red-noise world, however, with data coming from in¬ 
struments with trends, such is unwise, as it can elimi¬ 
nate valid information because of trends that might be 
far away temporally (and thus irrelevant, because un¬ 
connected to the signal in question). To be precise, a 
signal can be perfectly valid and should not be discarded 
because of an instrument malfunction that occurred, for 
example, only once and only in a distant phase-folded 
time. 

We suggest two tests for the uniqueness of the dips in 
a given dataset, one hard test and one soft test. The 
hard test demands that one cannot shift the (virtual) 
folded transit time, and then still have a significant flux 
loss before ingress and after egress. To test this, one 
plots a phase-folded flux graph that bins the flux to a 
useful duration, e.g. one transit duration (of course this 
choice depends on the putative orbit axis). In the default 
configuration, one gets a dip in the first bin before and 
after transit. The test demands that no significant dip 
occurs at bin n+1 and bin n— 1, with a virtual transit in 
a sliding bin n. As can be checked in Figure 11, Kepler- 
264b passes this test. 

The soft test is that the two binned dips before and 
after the (real) transit are the two largest dips that occur 
in the whole dataset. For Kepler-264b, this test is also 
successful, albeit only marginally. If this test fails, we 
recommend to plot a temporal map (like the river-plot, 
Figure 20) of the relevant dips. If single outliers cause 
other, deeper dips, we would judge it fair to remove these 
(few) outliers as the likely cause. 

Unfortunately, the test is less useful for short-duration 
transits on long periods: For a hypothetical moon around 
Kepler-102e, the bin width is ~0.05d during a period of 
16.Id, giving 322 bins to compare. Also, it is tedious 
to check and remove numerous outliers that naturally 
occur from noise (16 expected at the 95% significance 
level). Consequently, the other candidates fail this crite¬ 
ria without such clean-up. We suggest to skip the test 
for this category of planets. 


5.6. Results for criteria OSE5: Star-spots 



Phase-folded time in days before and after mid-transit 

Fig. 11. — Phase-folded flux for Kepler-264b in bins of one transit 
duration, for the complete dataset. Planetary transit is between 
the grey lines. The two neighboring bins before ingress and after 
egress (black dots) have the highest flux loss of all bins, although 
the bin before ingress only marginally. Care must be taken that 
the bin width is adjusted so that no planetary flux loss leaks into 
the other bins in question. The bin during planetary transit is not 
shown for clarity (at ~300ppm, it would be four times the diagram 
height downwards). 

We recommend the use of the normalized PDC-SAP 
data to search for the stellar rotation, as it effectively 
removes non-astrophysical systematics in the photome¬ 
try while leaving the stellar variability intact (Smith et 
al. 2012). Of course, the transit events need to be re¬ 
moved. We computed Lomb-Scargle periodograms and 
Fourier transforms for our candidates. For Kepler-264b, 
we found a consistent highest power peak at a period of 
11.1±0.07 days (with the uncertainty being the full width 
of the peak at half maximum). This peak is accompanied 
by several significant sidelobes at periods between 10.5 
and 12 days (Figure 12), as is expected for an activity 
behavior reminiscent to that of our sun, where differen¬ 
tial rotation causes a range of periods from 25 days at 
the equator to 34 days at the poles. 

The OSE might be imitated by stellar spot crossings, 
in case the “circumstellar orbital plane of the planet- 
satellite system were substantially inclined against the 
stellar equator (...) Such a geometry would strongly 
hamper exomoon detections via their photometric OSE.” 
(Heller 2014). In general, a stellar rotation period with 
an integer multiple of the planetary period would most 
strongly affect the OSE. In such a configuration, star- 
spots would re-occur at the same frequency as the plan¬ 
etary transit, creating dips in a phase-folded plot that 
might lie just before or after transit, resembling the OSE. 
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Fig. 12.— Periodogram for 264b, showing the strongest peak at 11.Id (left), likely the rotational period with an amplitude of 81ppm. 
Zoom (right) shows sidelobes from differential rotation, a strong indicator for stellar rotation. 


In the case of Kepler-264b, however, the closest inte¬ 
ger multiple, 4 x ll.ld=44.4d, is too far off to the pe¬ 
riod of planet b (40.8d). We find it therefore unlikely 
that the OSE is mimicked by star spots. The same is 
true for Kepler-102e. For Kepler-241c, we find several 
peaks, the strongest at 12.9d and 19.Id. Smaller side- 
lobes are around one half, and one third of the planetary 
period (36.Id). In this case, it is irrelevant what the 
stellar rotation period in fact is: There is the possibility 
that star-spots cause the OSE for planets orbiting this 
star, and thus the test is considered to be failed for both 
Kepler-241b and c. 

5.7. Results for criteria OSE6: Rings 

Rings can form when a body enters inside a planet’s 
Roche radius, where the body is ripped apart by tidal 
forces and its particles become a planetary ring. The 
Roche radius can be calculated as 2ARp(Dp/ Dg) 1 / 3 , 
with D denoting the relative densities of planet and satel¬ 
lite. For plausible configurations using the values from 
section 5.8.5 for Kepler-264b, the Roche lobe for a plan¬ 
etary density of 2.9g cm -3 is ~50,000km, an order of 
magnitude smaller than the putative moon orbit. For 
lower densities, and for the other candidates which have 
smaller radii, the Roche lobe is even smaller. 

A simplified search for Saturnian rings has been per¬ 
formed by Tusnski & Valio (2013), using synthetic Ke¬ 
pler data and a dark Saturn-model. Rings, however, also 
produce diffractive forward-scattering and consequently 
a flux gain , which helps to distinguish them from moons. 
In a more refined simulation, Hippke (2015) examined 
our Saturn transiting our quiet Sun through a simulated 
PLATO 2.0 instrument (see their Figure 9). The study 
used real solar data and the ring model from Barnes & 
Fortney (2004), together with calibration data for Sat¬ 
urn’s rings from the 1989 occultation of 28 Sgr by Saturn 
(French & Nicholson 2000). By signal injection and re¬ 
trieval, it was shown that both the flux gain and loss from 
ring occultations can be detected. Full modeling of such 
transits is a topic of recent interest, because rings cause 
an increase in transit depth that “may lead to misclassi- 
fication of ringed planetary candidates as false-positives 
and/or the underestimation of planetary density” (Zulu- 
aga et al. 2015). 


Depending on the ring composition (ice and/or rock) 
and density, forward-scattering might be strong or (very) 
weak. We encourage further study of this timely and 
interesting topic. 

5.8. Results for criteria OSE7: Plausibility and orbital 
parameters 

It is useful to check whether a putative star-planet- 
moon configuration is astrophysically possible, plausible 
and stable over the long time. While the possibility of 
any configuration is essential, the situation is less clear 
for the plausibility: Before the discovery of Hot Jupiters, 
such planets would have seemed implausible, and their 
discovery (Mayor & Queloz 1995) was challenged as they 
were found to be incompatible with theories of planetary 
formation (Rasio et al. 1996). 

The same is true for the stability. While we should 
assume that most moons are stable over long (Gyr) 
times, there might be configurations for which this is 
not the case. One related example are Saturn’s rings, 
which would be observable with Kepler -class photometry 
(Barnes & Fortney 2004), and are known to be unsta¬ 
ble on timescales of <100 Myr (Dougherty et al. 2009). 
Thus, instability should raise strong doubts about the 
presence of a moon (and less so about the presence of a 
ring), but does not proof their non-existence. 

We suggest to derive estimates for putative planet- 
moon configurations, as will be explained in the following 
section, and then to assess the system. 

5.8.1. Fitting the data to the OSE 

A fit of the data to various exomoon parameters can be 
done using the analytical equations from Heller (2014) 
with an iterative % 2 minimization. The parameter space 
of such a Monte-Carlo run can also include the test for 
multiple moons versus a single moon. With Kepler-data,, 
however, we judge the differences (a few ppm) as too 
small for a try. 

Alternatively, one can create numerical simulations, ei¬ 
ther on picture/pixel level, or on photometric level, as 
we have done. The principle of stacking is shown in Fig¬ 
ure 1, and we make our software template available for 
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Fig. 13.— Effects of limb-darkening on the OSE. Left: Synthetic transit shape of Earth’s moon transiting our Sun (black line, 6000K). 
While keeping all else identical, we have changed only the quadratic limb darkening parameters to match temperatures of 4000K (red) and 
8000K (blue). Right: The resulting OSE for these three transit curves, shown as a zoom into egress. The differences are clearly visible, 
but very small compared to expected Kepler noise levels. 
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Fig. 14.— OSE fit for Kepler-264b with lcr uncertainties. Dashed line is a numerical OSE fit. Left: Separate data for ingress and egress. 
Right: Binned together assuming symmetry. Best fit gives R^ = 1.6 zb O.2R0, semi-major axis 660,000km. 
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Fig. 15.— OSE fits for Kepler-241b, comparing two detrending options. Same scales for comparison. Left: Median filtering, giving best 
fit = 0.8 zb O.IR 0 , semi-major axis 450,000km. Right: Parabola detrending, giving best fit R ^ = 1.1 zb O.IR 0 , same semi-major axis. 
This shows the uncertainty which is caused by the choice of detrending. 
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the interested reader ’. In this paper, we have only fit¬ 
ted single moons on circular, symmetrical cases. Using 
the method as demonstrated in the spreadsheet, we have 
numerically sampled OSE curves using 10,000 samples, 
and a time resolution of lmin. For every system, we have 
created these curves for varying moon radii (in 0.01 R® 
steps) and semi-major axes (in 10,000km steps). This 
results in a large (10 5 ) number of possible solutions. To 
find the best fit, we have used the unbinned data, and 
performed a least-squares fit to the collection of synthetic 
curves, choosing the one with the smallest squared resid¬ 
uals as the best fit. To determine uncertainties, we have 
adopted the full width at half mean. 

This method introduces two slight errors. It adopts the 
transit curve (and thus transit duration) of the planet for 
the sampling of the moon OSE. This neglects the differ¬ 
ence in transit duration due to the smaller body, and 
thus introduced a slight error (~1%). It has, however, 
the advantage to naturally account for the best-fit limb 
darkening, as this is included in the best-fit planet tran¬ 
sit. The second error is that in the perfect sky co-planar 
configuration, a small number (<6.4%, see section 5.9.1.) 
of mutual planet-moon eclipses occurs, which the sampler 
ignores. This could be accounted for by removing an es¬ 
timated fraction (e.g. 3%) of samples. To keep this work 
simple, we have neglected the slight error. Note that 
there are near-sky-coplanar configurations (towards in¬ 
clination), where no mutual eclipses occur. So this choice 
is in fact one possible configuration. 

We have estimated the effect of an error in limb dark¬ 
ening values, by calculating different transit curves, and 
resulting OSEs, for various limb darkening parameters 
(and keeping all else equal). For this, we calculated syn¬ 
thetic light curves for a Sun-Earth-Moon configuration 
using PyAstronomy , and varied only the temperature of 
the star. While the uncertainty in stellar temperature 
is 'MOOK for the candidates in this paper, we have used 
values of 4000K, 6000K (approximately our Sun) and 
8000K. Obviously, for a star of the size of our Sun, the 
temperature range cannot be that large. In order to ex¬ 
plore the extreme ends of limb darkening, however, we 
have chosen these values. The corresponding coefficients 
are taken from Claret & Bloemen (2011). Afterwards, 
we calculated the corresponding OSE. As can be seen in 
Figure 13, the effect is small, but clearly visible. For a 
smaller error in temperature, e.g. the typical uncertainty 
of 100K, the effect on the OSE is very small. There¬ 
fore, we can safely ignore limb darkening uncertainties, 
and continue with the best-fit values from the planetary 
transit models. 

We have performed these simulations for all four sys¬ 
tems in question. The best-fit results are shown for 
Kepler-264b (Figure 14), Kepler-241b (Figure 15) and 
c (Figure 16), and for KOT367-01 in Figure 17. 

In addition, we have performed a signal-injection and 
retrieval, as will be explained in the following section. 

5.8.2. Signal injection and retrieval 

A useful test is to use real data (including its red noise), 
without any transit events, and inject simulated moon 
(and planet) transits into it. Afterwards, both results 

5 http://jaekle.info/ose/ 
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Fig. 16. — OSE fit for Kepler-241c using median filtering. In 
this case, the OSE has a distinctively different shape at planetary 
mid-transit, due to the smaller semi-major axis. 

can be compared; they should be consistent within the 
errors. 

We have performed this for Kepler-264b, using the 
same number of transits (26) and randomly chosen plain 
time segments of Kepler-264b data. Into these data, we 
have injected the moon dips described in the previous 
section. Afterwards, we have retrieved and folded them 
to the same OSE. The result is virtually identical to the 
real data, and visually indistinguishable from Figure 14. 
We have repeated this test 10 times using different seg¬ 
ments of data, and all look very similar - some a bit 
above in flux compared to the real result, some a bit be¬ 
low. This is of course no proof that a detection is real, 
but it is a necessary test to check that a detection can 
be possible with the given data quality. Clearly, the real 
Kepler-264b data is sufficient for a 1.6 R® exomoon de¬ 
tection. 

It would be beneficial to repeat this test for all can¬ 
didates in this work. However, the signal-injection and 
retrieval has proven very time consuming. One needs to 
generate the best-fit transit shapes for planet and moon, 
select appropriate time segments, scale all values and 
inject the data. Then, all transits need to be checked 
manually: Are there any gaps in the dataset that re¬ 
duce the number of transits required? As the number of 
transits should be the same in the real data and in the 
injected data, one needs to manually shift the plain data 
until this is resolved. Afterwards, the full OSE find-and- 
retrieve is required. Finally, we recommend to repeat this 
procedure several times, as in Monte-Carlo sampling, to 
avoid choosing “better” or “worse” plain data (in terms 
of noise) than the original. We encourage the develop¬ 
ment of a software tool for this task, which will be valu¬ 
able for future OSE validations. 

5.8.3. Plausibility of the OSE fits 

From the results shown in Table 2, it is apparent that 
the putative total moon radii are all larger than single 
solar-system moons. It is useful to put these results in 
context. We will take Kepler-264b as an example, as this 
candidate has the largest putative moon. 

Assuming a solar radius of R* = 1.55 Rq for Kepler- 
264b and a planetary radius Rp = 3.3±0.74R®, the total 
exomoon radii would then be ~10,000km (1.6 R®) on a 
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Fig. 17.— KOI367-01. Left: Ingress and egress shown separately for illustration. On the egress side, the slope test (OSE2) has failed, 
as the flux should decrease towards planetary transit. Right: Combined data giving a best fit for R.q = 0.8 ± O.li?®, semi-major axis 
150,000km. 


semi-major axis of asp/Rp = 31±11 planetary radii, i.e. 
~660,000km. For comparison, the Earth/Moon orbit is 
at 384,400km; Jupiter/Ganymede at 1,070,000km. The 
uncertainty of the semi-major moon axis is dominated 
by the planetary impact parameter b = 0.93+gg4 and the 
host star’s radius, f?* = 1.55±0.31J?q, which is reflected 
in the planetary radius uncertainty, 3.3±0.74R®. It is 
important to note that these parameters quantify the 
total OSE, and it is unclear whether this comes from a 
single moon or multiple moons. 

For comparison, Jupiter’s largest moons Ganymede 
(2,632 km, 5.6ppm), Callisto (2,410 km, 4.7ppm), Io 
(3,660 km, 2.7ppm) and Europa (1,560km, 2ppm) would 
account for a flux loss of 15ppm, that is ^50% of the 
required 31ppm, in case they were orbiting Kepler-264b 
on short orbits. Kepler-264b is almost (86%) the size of 
Neptune, thus making the argument for several moons 
more plausible. Figure 18 shows a sketch of the poten¬ 
tial exomoon size for the case of a single moon. 

Following formation theories, such large moons would 
likely not have formed in situ (Canup & Ward 2006), but 
be captured (Williams 2013). It has also be shown that 
Neptune-sized planets can capture Earth-sized moons 
(Porter & Grundy 2011). 

5.8.4. Shape of the OSE 

The data quality is not sufficient for a detection of the 
“wings”, where moon egress/ingress takes place just be¬ 
fore/after planetary transit. This occurs for single moons 
on wide orbits. A detection of this feature might be a 
powerful tool used with better photometry in the future: 
It requires the data to not only exhibit some dip, but fol¬ 
low a distinct dip shape. Judging from Figure 14, ~2-3 x 
lower noise is required for a 2tr detection, indicating that 
such a detection is possible with PLATO Hd-class pho¬ 
tometry. Indeed, in a recent study by Hippke (2015), 
this was demonstrated by signal-injection and retrieval 
for a 2R® planet with a 0.4R® (Ganymede-sized) moon 
orbiting a 0.5R e M-dwarf. 

5.8.5. Orbital stability 

Kepler-264b is the most critical case in terms of stabil¬ 
ity, as its fit gives the largest radius and the widest orbit. 



Fig. 18. — Size comparison for Kepler-264 planets a and b, to¬ 
gether with potential exomoon .01. The bottom row shows some 
solar system objects for comparison. Distances are not to scale. 


Therefore, we discuss orbital stability for this example. 

Neglecting a potential eccentricity, the Hill sphere can 
be calculated as r = a <(/ M p /3A7g ■ where r is the Hill 
radius and a the distance between star and planet. As 
the mass of the planet cannot be measured from the 
photometry directly, we can only derive limits for the 
planetary density with regards to the Hill sphere. For a 
Hill radius of 660,000km, the planetary density needs to 
exceed 1.75g cm' 3 , which is slightly above the density 
of Neptune (1.64g cm -3 ). For a heavier composition of 
e.g. 2.9g cm -3 , as in the comparably sized Kepler-20c 
(Gautier et al. 2012), the Hill sphere would extend to 
1,700,000km, so that the potential exomoon would lie at 
39% of the Hill radius. 

Following Kepler’s 3rd law, one can derive limits on 
the planetary mass due to the presence of a moon. Pre¬ 
vious studies proposed transit timing variation (Kipping 
2010) and the OSE (Heller 2014) to measure the mass of 
a planet, using its moon. There is a third test: Hill sta¬ 
bility. Depending on the mass (density) of the planet, 
264b’s orbital period for a 660,000km orbit would be 
in between 12.9 days (2.9g cm' 3 ) and 17.1 days (1.64g 
cm -3 ). This shows that exomoons can reveal a planet’s 
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mass from Hill stability, using only photometry. 

Retrograde moons are argued to be stable up to nearly 
(93%) of the Hill radius (Domingos et al. 2006). On 
the other hand, prograde moons (which provide the vast 
majority among the principal solar system moons) are 
only stable out to about 0.5 Hill radii. A prograde moon 
at 39% of the Hill radius would be at the edge of orbital 
stability. And, second, it would be much farther away 
from its planet than any major moon in the solar system 
(in terms of Hill radii, not absolute distances). So the 
moon interpretation in this case is doubtful. 

To sum up, an exomoon around Kepler-264b could be 
stable for the (less common) retrograde case, and if the 
planet’s density exceeds that of Neptune, which is plau¬ 
sible for a planetary radius of 3.3 R®. 

5.9. Results for criteria Cl: Temporal spread 
5.9.1. River-plot 

In the case of one moon, it should only block light on 
some transits (not all), and only on one side (ingress or 
egress) per transit. Unfortunately, the photometry is at 
the very limit for exomoon dips on the ppm level in a 
single transit. We suggest to make a “river plot” (Carter 
at al. 2012; Nesvorny et al. 2013), and show this for 
Kepler-264b in Figure 19. In the stretched version (right 
panel), one tends to believe that the dips are indeed one¬ 
sided. For instance, the first row (period) shows a flux 
loss before ingress, while the second has only a flux loss 
after egress. Also, a few transits seem to lack any moon 
dip, as expected for those cases when the moon is in front 
or behind the planet, or not transiting at all (e.g., due 
to a non-zero inclination). However, there are also rows 
that show features both before ingress and after egress 
(e.g. row 5). What is more, some rows show features 
longer than possible for a moon transit (e.g. row 13). 

Such noise features are also present in a null test (Fig¬ 
ure 20) for the phase folded time -11 days before plane¬ 
tary transit, where the third largest dip (after ingress and 
egress) in the dataset occurs, caused by noise. Without 
photodynamical modeling, which is beyond the scope of 
this paper, we find it hard to decide which of these fea¬ 
tures have an individual astrophysical cause, and which 
are instrumental trends or simple noise. 

As can be seen in Figure 19, 8 out of 27 transits (30%) 
do not show a potential moon transit outside the plan¬ 
etary transit. This can be compared to the theoretical 
upper limit of ~6.4% (~2 transits), as derived by Heller 
(2012) using the Roche stability criterion. Thus, the 
missing 6 transit events indicate that the cause for the 
dips is not a moon. To have a 30% probability of a moon 
eclipsing its planet in the given star-planet-configuration, 
the moon would need to be well within the planetary 
Roche radius, breaking the moon apart. Moreover, if the 
moon’s structure would be stable due to a ridiculously 
high moon density, then the moon would actually show 
multiple eclipses during one stellar transit (Heller & Al¬ 
brecht 2014). There are two ways around this problem, 
and both are problematic. The first is to assign the miss¬ 
ing transits to red stellar noise, as a single moon transit 
S/N is <3. This would however mean that the positive 
events (flux loss events) could equally be caused by red 
noise, with the opposite sign. The second way around it 
is to assume a non-coplanar configuration, e.g. inclined 
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Fig. 19. — Left: River-plot for Kepler-264b. Each row is one 
transit (in between vertical lines); time runs from top to bottom. 
Darker colors are lower flux, whites in the left plot are missings. 
Both top plots show the same data, but with different shadings for 
clarity. On the right plot, it can be readily seen that the flux loss 
clusters before ingress and after egress. It is also evident that this 
flux loss is not located in only part of the data (e.g. in only one 
line, in only the first half), but is present all over the dataset. 
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Fig. 20. — Left: River plot null test for the phase folded time -11 
days before planetary transit, where the third largest dip (after 
ingress and egress) in the dataset occurs, caused by noise. Flux 
loss seems to concentrate on the first rows. Right: Same time and 
data, after injection of the transit dip. Flux loss is not apparent 
anymore, indicating a noise-only dip, as is expected at this time of 
no transit. 

(compare Figure 8). This explains the missing transits, 
but would further increase the (already large) moon ra¬ 
dius. 

To sum up, we judge the river plot (the individual tran¬ 
sit events) as rather disfavoring the moon hypothesis, 
without rejecting it completely. It is also worth noting 
that the OSE framework as introduced by Heller (2014) 
considers only phase-folded light curves. However, the 
examination and validation of individual, contributing 
transit-like signals can be a useful addition to validate or 
refute a moon hypothesis. 

5.9.2. Buildup-plot 




























On the detection of Exomoons 


19 


40 

20 

T 

Q. 

3 o 
o 

* -20 

a; 

•| -40 
J2 

3 

E 

5 - 60 
-80 

-100 


10 15 

Number of transits 


10 

0-- 

-10 
7-20 J 

Q. 

a. 

8 -30 . J ^.- -.- -.. -.i...< - 

_o 

x 

E -40 - 
-50 
-60 

-70 --- 

19 20 21 22 23 24 25 26 

Number of transits 



Number of transits 


Fig. 21.— Top: Real build-up for Kepler-264b flux loss over 
time. After a random episode, the OSE builds up from transit 
# 13 on and stabilizes at ~30ppm after transit #16. Uncertainties 
are 2a (95%). Middle: Monte-Carlo test asking for the average 
number of transits required (result: 21) for a detection at the 95% 
significance level. Bottom: Null test for the phase folded time -11 
days before planetary transit, where the third largest dip (after 
ingress and egress) in the dataset occurs, caused by instrumental 
and/or stellar noise. Flux loss is only apparent at the beginning of 
the dataset. After only 5 transits, no OSE can have build up; proof 
of a false positive dip. Also, as is expected for a random-walk of 
noise, the trend is likely to return to zero after long enough time. 


Another test should be made to answer the question 
about how precisely the OSE builds up over time. Due 
to the very nature of the additive sampling of the effect, 
which includes non-transits, the dip cannot be present 
after just a few transits. Such would indicate noise as 
the cause. On the other hand, the cumulative flux loss 
should build up within the first few 10s of transits. The 
exact number depends on the noise floor and frequency 
of moon transit, but should in general be between 10 
and 50. After the buildup, no further (additional) flux 
loss should appear, but the cumulative depth of the dip 
should stabilize at its final level. Additional data should 
then only decrease the uncertainties, but not deepen the 
dip. 

To test this, we find it useful to plot the number of 
transits versus the average flux loss, measured in an ad¬ 
equate bin before plus after planetary transit. For our 
example of Kepler-264b, we can see that the cumulative 
flux loss follows a random walk until transit #13 (Fig¬ 
ure 21). Afterwards, a dip begins in the cumulative data, 
becoming significantly negative from transit #17 on (the 
chart shows the 2cr (95%) uncertainties). After transit 
#19, the cumulative flux loss stabilizes at ^30ppm. 

For Kepler-241c, we find that the OSE stabilizes after 
transit #23 (Figure 22). The progression during the first 
few transits is skewed southwards due to a few outliers, 
but then stabilizes at the final level. 

Of course, the start and the end of the observations 
could have happened at other points in time. To test 
the general build-up of the OSE, we have performed a 
Monte-Carlo test for Kepler-264b with N = 26 x 10 6 re¬ 
alizations to answer the question of how significant an 
OSE would be, on average, after only having observed 
randomly chosen n of N =26 transits. Figure 21 shows 
the results for the 95% significance level. It can be seen 
that, on average, 21 transits are required for a marginal 
detection. A more refined test for this question would be 
a cumulative Bayes factor map (Gandolfi et al. 2015). 
From a purely statistical view, one could also ask for 
stationarity in the data, i.e. test whether the time series 
follows a random-walk (null result) or converges with a 
trend. To test this, we employed the Augmented Dickey- 
Fuller test (Said & Dickey 1984). The test rejects the 
random walk hypothesis for Kepler-241c at the 99.9%- 
level, and attributes an 80% probability for a trend in 
Kepler-264b. Our null test for a random walk was the 
same dip of Kepler-264b at the phase-folded of -lid, for 
which the test gives only a 10% probability to the ex¬ 
istence of a real trend, as is expected. Such unit root 
tests are known to benefit from longer time-series, and 
the results for n=26 and n=40 should only be regarded 
as indicators. 

Finally, we have repeated the tests for Kepler-241b and 
KOI-367.01 (Figure 23). For Kepler-241b, it can be seen 
that after ^30 transits, a 2<r flux loss appears. After ~60 
transits, it stabilizes at the final value. The behaviour for 
KOI-367.01 is more erratic; the significant dip after only 
~10 transits cannot be real due to insufficient sampling. 
This must be caused by noise. The next dip occurs after 
~30 transits, and remains at the same level until the end 
of the observations. 

To sum up, we interpret these results in favor of the 
OSE interpretation (over noise) for our candidates. 































































20 


Michael Hippke 


1.0020 

1.0015 

1.0010 

1.0005 



1.00000 

0.99995 

0.99990 

* 0.99985 

IZ 

| 0.99980 

_ro 

0) 

* 0.99975 
0.99970 
0.99965 
0.99960 


Number of transits 


10 15 20 25 

Number of transits 


Fig. 22.— Left: Individual flux measurements for Kepler-241c for each transit (taking the average flux for a time of 0.05d before ingress 
and after egress of each transit). Most periods are, by itself, consistent with no flux loss. However, it is apparent that the average is below 
nominal flux, indicative of another transiting body. Right: Cummulative flux loss over time. After a random episode, the OSE stabilizes 
after transit #23. 
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Fig. 23.— OSE buildup for Kepler-241b (left) and KOI-367.01 (right) with 2a uncertainties as grey shades. 


5.10. The scatter peak 

The SP method employs a sliding boxcar median of the 
folded lightcurve to subtract the average transit shape. 
This removes any flux loss from OSE and planetary tran¬ 
sit, and makes the SP an independent method. 

Unfortunately, we find that the trends in the Kepler 
data make the analysis of the SP highly problematic. 
The raw SAP flux data contains instrumental trends on 
the order of ~1%, rendering any scatter on ppm scale 
invisible. Clearly, detrending is required. The approach 
described in section 4.3, where the actual transit (plus 
a bit of data before and after the actual transit to ac¬ 
count for TTVs and TDVs, and to protect the moon 
dip) is masked, creates an artificial scatter peak in it¬ 
self. This comes from the simple fact that some area is 
not detrended as well, as it is excluded from the fit, and 
has thus higher noise. To check this, one can fold to a 
plain area of no transit (where no scatter peak can be), 
and detrending with excluding some data then promptly 
creates a scatter peak (Figure 24). 

For quiet stars, the Bayesian Maximum A Posteriori 
(MAP) approach (as explained in section 4.3) might be 


useful, as it performs the detrending using other reference 
stars. For our example of Kepler-264b, we are however 
left with noise from stellar spots on the order of ~80ppm 
(section 5.6), which overshadows a 30ppm moon signal. 

We have considered two methods that can be applied 
in such cases. 

5.10.1. Subtraction of scatter template 

As the detrending creates a scatter peak anywhere in 
case of blinding the fit, this false positive peak can be 
created n times at plain phase fold times, and its aver¬ 
age height subtracted, like a template, from the time in 
question. In practice, unfortunately, the template shows 
a scatter increase of ~5-50%, while the real scatter peak 
within Kepler noise is expected on the order of ^1%. We 
find that after subtraction of the template, all results are 
spurious, and the method to be useless. 

5.10.2. Temporal cleanup 

If detrending is so problematic, one can try to avoid it 
and clean up the mess afterwards. This method uses the 
PDC-SAP data, without the instrumental trends but still 
containing occasional jumps from stellar activity. The 
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Fig. 24.— Median-filtered scatter for Kepler-264b with false¬ 
positive scatter peaks at > 8 a significance due to improper de¬ 
trending. 

idea is to create the scatter peak plot, and an additional 
map answering the question of its temporal origin (Fig¬ 
ure 25). This map can then be used to set a useful thresh¬ 
old, e.g. keeping the more quiet 80% of the data. Then, 
one can cleanup the scatter peak plot with this decision. 

For Kepler-264b, this method seems to work, as the 
scatter is relatively smooth afterwards. However, the 
noise level is still too high (at ~50ppm) for the detection 
of the scatter peak, which is consistent with numerical 
simulations done by Simon et al. (2012). Instead of 
19 periods short-cadence data, about 50 to 100 are re¬ 
quired. We conclude that no SP has been detected, for 
a lack of data, so that the test SP1 (significance) cannot 
be answered. The same is true for all other candidates. 
From our experience, the application of the scatter peak 
is not possible for stars that require heavy detrending 
on short (<10d) timescales. For longer timescales, very 
low amplitudes and very short transit durations, certain 
detrending methods might work, such as fitting polyno¬ 
mials or fully modeling the rotation with star-spots. The 
key is to exclude as little data as possible (during tran¬ 
sits) from the fits. Such modeling, however, is beyond 
the scope of this paper. 

5.11. Results for criteria SP2: Period shift and SP3: 

TTVs+TDVs 

There is one major other configuration that also causes 
a scatter peak: Imperfect transit folding. This can occur 
due to TTVs and TDVs, and also when using incorrect 
periods. One such case is Kepler-lld, where the pub¬ 
lished period is 22.6845±0.0009d (Lissauer et al. 2013). 
When using this period, the median-filtered phase fold 
shows clear signs of a folding error in the form of “wings” 
during ingress and egress (Figure 26). At first glance, 
this can be mistaken for a scatter peak, but the main 
difference is that it only appears on the time of ingress 
and egress, and not around mid-transit. When correcting 
the period to P=22.687d, the scatter peak is completely 
gone. This makes for a good test: Shifting the period 
should not decrease the scatter peak, i.e. the scatter 
peak should not be removable by “correcting” the pe¬ 
riod. If it does, the period is wrong, and the scatter 
peak irrelevant. 

Using Kepler-264b as an example, we have also tried 



Fig. 25.— Scattermap for Kepler-264b short cadence data. On 
the horizontal axis, a time from -lOd to +10d in the phase fold is 
shown, with a bin width of Id. On the vertical axis, the 19 periods 
of SC data are displayed. Upper panel shows raw data, lower panel 
after cleanup of the 16 largest outliers (in black color before, with 
crosses afterwards). 



Fig. 26. — Median-filtered scatter for Kepler-lld with false¬ 
positive scatter peaks at > 8 a significance due to an incorrect 
period. 
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shifting the period, and done this for ±0.1d, which is 
~190x the uncertainty of the period. Our step rate was 
O.OOOOOld. Around the best fit period, we found scat¬ 
ter peaks fluctuating by ±10%; an insignificant differ¬ 
ence. Further away (> |0.015|d), the scatter peak in¬ 
creases significantly due to sidelobes of the folding error 
(see Figure 26). We conclude that the period is, within 
the errors, correct. 

5.12. Results for criteria SPf: Plausibility and C2: 

Temporal spread 

Any detection of a scatter peak must be verified with 
numerical simulations, in order to assure that the peak 
height is in agreement with the expected value. This can 
be done as described in section 5.8.2, by injecting transits 
into real data (without transits) and retrieving them for 
comparison. 

Following criteria Cl, the data can be split the data in 
equally sized parts and performed the SP tests described 
above. For a lack of scatter peak detection, we have not 
pursued this task. 

6. DISCUSSION 

As summarized in Table 3, no OSE-like signal passes 
all tests. Nominally, Kepler-264b has the best result. 
However, its OSE fit gives the most extreme values 
for the moon radius ( R^ = 1.6 ± 0.2orbiting a 

3.33 ± 0.741?® planet), raising serious doubt about its 
plausibility. While likely physically possible, it is un¬ 
clear whether such a configuration can be real. On the 
plus side, different detrending methods give consistent 
results. 

While it seems clear that Kepler-264b shows an OSE- 
like signal in the phase-folded light curve, one might ar¬ 
gue that it is not caused by a moon. This poses the ques¬ 
tion, whether the feature is only due to red noise. As dis¬ 
cussed in section 4.1, we should expect ~3 false-positives 
among the 958 suitable planets, from the dip depth alone. 
We had also argued that the consequent tests and crite¬ 
ria should decrease this number. For Kepler-264b, the 
test results are mixed, and we can thus not exclude a 
false-positive from time-correlated noise. After spending 
much time with this candidate, and looking at the data 
in many ways, we like to believe that perhaps part of the 
signal is due to moon(s), in combination with red noise, 
but this should be considered speculative. 

The system of Kepler-241b and c offers more plausi¬ 
ble, but still very large putative moons (1.1 ±0.1i?® and 
1.4 ± O.li?®) orbiting Super-Earths. Both planets show 
limited significance of the individual dips (OSE1), and 
issues with the slopes of the dips (OSE2). On the other 
hand, both are easy to detrend and give similar results 
for all detrending methods. 

Kepler-102e and 202c show marginal significance of the 
individual dips, and both are vulnerable to detrending 
options, with parabola detrending giving better OSE fits. 
Kepler-102e raises doubts of star-spot induced flux vari¬ 
ations. 

For KOI-367.01 and Kepler-202c, the largest issue is 
certainly the fact that the parabola detrending gives a 
different result (no OSE), when compared to the median 
detrending. In this case, we have to assume a detrend¬ 
ing artefact. The same is true for Kepler-102e, which 


also suffers from doubts of star-spot induced flux varia¬ 
tions. On the other hand, it is interesting to note that 
planet e is the heaviest in the Kepler-102 system; how¬ 
ever it does not have the largest Hill radius: Using the 
data from Marcy et al. (2014), the Hill radius for 102e 
is 433,000km, but that of 102f is larger at 483,000km. 
Therefore, one might argue that 102f would be more 
suited to host a moon. Also, in a best-fit moon orbit 
around 102e, at 300,000km, only retrograde moons would 
be stable. What is more, moon formation through accre¬ 
tion will only produce very small moons - if any - around 
the known planets around Kepler-102. This is because 
the planets are sub-Neptune in mass and very close to 
their star. This also makes moon formation via capture 
very challenging (e.g. Williams (2013)). Consequently, 
a wide-orbit, large-radius moon around 102e is unlikely 
to exist. 

The general mix of test passes and failures makes it 
very difficult to consider any one signal the most promis¬ 
ing. From statistics, we have to assume that three of the 
six are false-positives, and tend towards rejecting Kepler- 
202c, Kepler-102e and KOI-367.01. 

For Kepler-241b and c, and Kepler-264b, we recom¬ 
mend follow-up analysis using photodynamical modeling. 
For future work, we have summarized our test recommen¬ 
dations in Table 4. 

Regarding the super-stack, we should note that stars 
and moons are of different sizes, so that our results can 
only be taken as a very rough estimate. It is also unclear 
whether the non-detection for periods < 35d comes from 
a real lack of moons, or from different sensitivities, noise 
or errors. We tend to interpret the result in the way that 
planets with P < 35 d have a lower probability for moons, 
and/or possess moons with smaller radii, so that the ef¬ 
fect becomes hidden in the noise. Further studies will be 
required in the future, and the present result should be 
taken with caution. It can however be interpreted as an 
indication for the existence of exomoons among planets 
with periods 35d < P < 80 d. 

Finally, it should be noted that exomoons can reveal a 
planet’s mass from Hill stability, using only photometry. 
Previous studies proposed transit timing variation (Kip¬ 
ping 2010) and the OSE (Heller 2014) to measure the 
mass of a planet, using its moon. To our knowledge, the 
Hill stability argument is new, and can be useful in the 
future as an additional test to constraint the masses of 
planets and moons. 

7. CONCLUSION 

“What is not disputed, is not particularly interesting” 6 
- as explained, we do not claim an exomoon detection in 
this paper, but only present methods and tests to con¬ 
sider them candidates. The detection of exomoons is at 
the limit of current photometry, and its methods and 
tests need to be checked, challenged and verified. The 
methods at work (OSE and SP) rely on phase-folded in¬ 
formation, so that the individual TTVs, TDVs and flux 
loss of the transits is lost. On the positive side, time- 
dependent red noise is reduced by stacking. A competing 
search strategy, as employed by Kipping et al. (2014), 

6 “Was nicht umstritten ist, ist auch nicht sonderlich interes- 
sant.” - Johann Wolfgang von Goethe (1749 - 1832), German poet 
and philosopher. 
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TABLE 4 

Recommended test assignement 


Test 


Statistics 

Plots 

OSE1 

Significance (l,r,b) 

t-tests 

Phase-folds (Figure 10) 

OSE2 

Slope (l,r) 

t-tests 

Phase-folds (Figure 10) 

OSE3 

Stacking 

Monte-Carlo 

River-plot (Figure 19), build-up plot (Figure 21, 22) 

OSE4 

Uniqueness (a,b) 

t-tests 

Phase-fold in bins (Figure 11) 

OSE5 

Star-spots 

Periodogram, Fourier 

Time-amplitude graph (Figure 12) 

OSE6 

Rings 

Density, Roche-radius 

- 

OSE7 

Plausibility 

- 

OSE-fit (Figure 14, transit plot (Figure 9) 

Cl: Temporal spread (OSE) 

Monte-Carlo, Dickey-Fuller 

River-plot (Figure 19), build-up plot (Figure 21) 

SP1: Significance 

t-test 

SP-plot (Figure 24, smoothed) 

SP2: Period shift 

- 

SP of phase-fold (Figure 26) 

SP3: TTVs+TDVs 

- 

SP of phase-fold (Figure 26) 

SP4: Plausibility 

Monte-Carlo (numerical) 

- 

C2: Temporal spread (SP) 

Monte-Carlo 

Scattermap (Figure 25) 


avoids phase-folding but instead uses a photodynamical 
approach and brute-force calculates and fits all possi¬ 
ble configurations. Such a search is beyond the scope of 
this paper, but could ultimately be used to (in)validate 
promising results from OSE and SP. As the photody¬ 
namical method is computationally expensive, OSE+SP 
can deliver candidates for further investigation. Every 
method has its advantages and disadvantages; its assets 
and drawbacks. On a side note, the OSE can readily de¬ 
tect and characterize multi-moon systems including the 
radii and orbits (given high-enough data quality), as suc¬ 
cessfully demonstrated by Heller (2014), which is not 
possible with the current photodynamical method. 

For the super-stack of all suitable planets of 35d < 
P < 80 d, we find an average flux loss of 6 ± 2 x 1CU 6 
(6ppm), corresponding to an average total exomoon ra¬ 
dius of 2120+37okm (0.33R®) for the average star radius 
of 1.24 Rq in this sample. The analysis is less sensitive 
to longer-period planets with four years of Kepler data, 
but there seems to be a hint of a lack of putative moon 
signal in planets with P < 35 d. 

At first glance, the high moon radii of the individual 
putative moons seem implausible, as there is no such 
large moon in our own solar system. However, our so¬ 
lar system might not be the norm - we have no Hot 
Jupiters, warm Neptunes, or Super-Earths in our solar 
system, and thus no reference for typical moons around 
such planets. Also, there is a strong selection bias, based 
on the detection limits (section 4.4), and in addition the 
simple fact that the strongest dips are most significant. 
The first moons to be found will likely be at the long 
(large/massive) end of exomoon distribution, as was the 
case for exoplanets. 

If all candidates shown in this work turn out to be noise 
only (or detrending errors), then we should still consider 
the OSE as a powerful tool for the detection of exo¬ 


moons or binary planets. Such a result would point to¬ 
wards necessary improvements in detrending algorithms, 
red noise treatment and modeling stellar noise, includ¬ 
ing spots and rotation. On the other hand, confirmation 
of any one candidate would make it an immensely use¬ 
ful method. In our solar system, a wide variety of large 
and small moons exists. This, together with formation 
theory, may lead to the conclusion that moons do exist 
outside of our solar system, i.e. rj Exo ^ > 0. The recent 
characterization of the ring system around J1407b (Ken¬ 
worthy & Mamajek 2015), potentially hosting moons, 
seems to support this. Consequently, a single exomoon 
confirmation would be a landmark in exoplanet science, 
and establish rj Exo > 0, with r] Exo <£ as the fraction of 
exoplanets which host at least one moon. In our solar 
system, six of eight planets host moons (but not Mer¬ 
cury and Venus), giving r) Solar ^ = 0.75 (although Mars’ 
small moons might be undetectable). As such, the poten¬ 
tial putative moons discussed in this paper should be re¬ 
garded only as candidates (or, if the more cautious reader 
prefers, signals ) worth further investigation. 
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